What do Journalists do with Documents? Field Notes for Natural Language Processing Researchers
Abstract
Natural language processing and visualization systems have been proposed to help journalists analyze large sets of documents, but very little has been said on what journalists do with documents in practice. We review a collection of 15 stories completed with the Overview document mining platform, characterizing the source material and reporting tasks. The median document set contained 4,000 documents and the majority arrived as paper or scanned paper. In most cases journalists knew what they were looking for in advance, in contrast to the large research literature concerned with “exploring” a document set. We also review five cases where custom NLP techniques were used to produce a story, including applications of topic modeling, entity recognition, text classification, and sentiment analysis. Based on the cases in these two collections, we recommend six practice-driven themes for natural language processing researchers who want to assist journalists with large document sets: 1) Robust import. 2) Robust analysis. 3) Search, not exploration. 4) Quantitative summaries. 5) Interactive methods. 6) Clarity and Accuracy.
Similar resources
Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting
With due respect to the authors’ rights, plagiarism detection is one of the critical problems in the field of text mining that many researchers are interested in. This issue is considered a serious one in high academic institutions. There exist language-free tools, which do not yield reliable results since the special features of each language are ignored in them. Considering the paucit...
A new text-mining method for extracting user context information to improve the ranking of search engine results
Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual and documentary material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the ones most similar to the user ...
Rookie: A unique approach for exploring news archives
News archives are an invaluable primary source for placing current events in historical context. But current search engine tools do a poor job of uncovering broad themes and narratives across documents. We present Rookie: a practical software system which uses natural language processing (NLP) to help readers, reporters and editors uncover broad stories in news archives. Unlike prior work, Rooki...
What Do Iranian EFL Learners and Teachers Think of Teaching Impoliteness?
Every language involves friendly and polite as well as hostile and impolite situations in which language users have to use context-appropriate language. However, unlike politeness, which has generated a great number of studies, few studies have been conducted on impoliteness, especially in EFL contexts. The present study aimed to see whether language learners and teachers hold the same idea conce...
Widdowson and Classroom Discourse
Drawing on recent developments in linguistic description and applied linguistics, it can be concluded that learning a language necessitates getting to know something and being able to do something with that knowledge: competence and performance. The structural approach to language description attaches importance to the former; the communicative approach, to the latter. Appropriate classroom discours...
Publication date: 2016